=semiconductors =manufacturing =computers
Can SRAM be stacked like NAND flash?
background
types of memory
SRAM stores each bit with a small group of transistors that connect the power supplies to each other's gates, forming a pair of cross-coupled inverters that hold their state as long as power is applied. Typically 6 transistors are used per cell, but many variations have been proposed.
DRAM uses a transistor per bit to connect a capacitor to an input/output line when that bit is written or read. The capacitors leak charge, so they must be refreshed periodically, and rewritten after each read.
Flash memory uses high voltage to push charge across an insulating layer, where it then stays in place indefinitely. The electrostatic field of that stored charge combines with fields from control wires to switch a transistor, 1 transistor per bit. Flash has a limited write cycle life because the insulating layers get damaged by charge passing thru them.
SRAM cell sizes
Smaller transistors can carry less current than larger ones. When the size of wires is decreased, their capacitance decreases less than their diameter, so if length and switching speed are held constant, the required current stays about the same. Smaller wires also have higher resistance; at current CPU wire sizes, surface and grain-boundary scattering matter, so conductance increases more than linearly with cross-sectional area. With roughly constant current, the voltage drop then increases, which for the same wire length requires more "repeaters" along the path.
The area required to store 1 bit in SRAM is called the "cell size". Basically for the above reasons, SRAM cell sizes haven't decreased much for a few years.
layer counts
The "nm"
number of process nodes
no longer
corresponds to any feature sizes. The current meaning of "X nm node"
seems to be something like "the transistor density is similar to what a
planar transistor process would have at X nm".
Yet, transistor counts have continued to increase. The only explanation, then, is more layers. Of course, more layers increase power usage proportionately without increasing the area available for heat dissipation, so a smaller fraction of transistors can be active at once.
That means performance per cost doesn't increase. Note that cost per transistor stopped going down after 28nm. Also, a few layers of transistors and wires isn't even close to the number of layers in modern flash memory.
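As a toy illustration of the heat-dissipation point: if the coolable power per unit area is fixed, adding transistor layers just lowers the fraction that can switch at once. The numbers below are assumptions chosen for illustration, not measurements of any real process.

```python
# Toy model: stacking transistor layers under a fixed heat-flux budget.
# All three constants are illustrative assumptions.

HEAT_LIMIT_W_PER_MM2 = 1.0            # assumed coolable power density
POWER_PER_ACTIVE_TRANSISTOR = 2e-8    # watts per switching transistor (assumed)
TRANSISTORS_PER_MM2_PER_LAYER = 1e8   # assumed planar density

for layers in (1, 2, 4, 8):
    total = TRANSISTORS_PER_MM2_PER_LAYER * layers
    # fraction that can switch at once without exceeding the thermal budget
    active_fraction = min(1.0, HEAT_LIMIT_W_PER_MM2 / (total * POWER_PER_ACTIVE_TRANSISTOR))
    print(f"{layers} layer(s): {total:.1e} transistors/mm^2, "
          f"{active_fraction:.0%} active, {total * active_fraction:.1e} switching/mm^2")
```

Once the chip is thermally limited, the number of transistors switching per mm^2 stays flat no matter how many layers are added, which is the sense in which performance per area (and per cost) stops improving.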
silicon interposers
Historically, CPUs have been a single layer, with transistors on the CPU
face side connected to contacts on the motherboard, and cooling on the CPU
back side. The current trend is towards chiplets put on a silicon
"interposer" layer.
Adding an extra semiconductor layer adds cost, so it must have some justification over the alternatives. Vs a single larger CPU die, chiplets with defects can be discarded individually, which makes the lower yields that come with higher layer counts practical. Chiplets also allow more modularity and thus design flexibility. Vs separate chips on PCBs, interposers can have much smaller wires, and can actively route signals with transistors.
Apart from the
extra silicon layer needed, interposers also need small holes
(through-silicon vias = TSVs) to connect the chiplets on their face side to
the motherboard on their back side. Making small holes thru silicon without
causing other damage is hard, and narrower holes are harder to make.
high-bandwidth memory (HBM)
DRAM chiplets put directly on a silicon interposer are called "HBM". If you can make TSVs, then you can stack multiple DRAM chiplets on top of each other and run signals vertically thru them, decreasing signal travel distance. Chipmakers are now starting production of 12-layer HBM stacks.
Why stack DRAM on top of other DRAM? Why not stack DRAM on top of logic to reduce distances even more? Logic chiplets use more power and reach higher temperatures, which are bad for DRAM: hot capacitors leak faster, so retention gets worse.
Why not stack SRAM caches like DRAM, since SRAM also has lower power usage than logic? Compared to on-die SRAM caches, the bandwidth of HBM is low and the latency is high; it's only "high-bandwidth" compared to sticks of DRAM on a DDR bus. A stacked SRAM chiplet would sit behind the same kind of interface, which makes the advantages of SRAM over DRAM mostly irrelevant.
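For a sense of scale on that claim, here are rough orders of magnitude. These are my own illustrative ballpark assumptions (they vary a lot between products and generations), not figures from anything cited here.

```python
# Rough orders of magnitude for bandwidth and latency of different memories.
# All values are illustrative assumptions, not measurements.

memories = {
    # name:                        (bandwidth GB/s, latency ns)
    "DDR5 DIMM, per channel":      (40,    90),
    "HBM, per stack":              (600,   100),
    "on-die L3 SRAM, aggregate":   (2000,  12),
    "per-core L1 SRAM":            (1000,  1),
}

for name, (bw, lat) in memories.items():
    print(f"{name:28s} ~{bw:5d} GB/s  ~{lat:4d} ns")
```

The latency gap matters as much as the bandwidth gap if the goal is to replace a cache.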
3d flash memory
Most flash memory made today is vertical NAND flash, in which current flows thru a whole vertical stack of transistors in series. 64 layers used to be typical; now, people are making 128-layer memory commercially.
my question
Given that
stacking 64+ layers of flash memory is practical, why isn't DRAM or SRAM
stacked like that? Increasing density would reduce signal travel distance on
CPUs.
2 TB of flash memory now costs about as much as 64 GB of DRAM. If SRAM could be made like flash memory with 10x the size per bit, it would still be cheaper than DRAM. Why not do that?
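Here's that cost arithmetic spelled out. The only inputs are the price equivalence quoted above and the assumed 10x area (and cost) per bit penalty for flash-style SRAM; the absolute price is a placeholder that cancels out of the ratios.

```python
# Cost-per-bit comparison using the figures from the text plus one assumption.

PRICE = 1.0                  # placeholder: price of "2 TB of flash" == "64 GB of DRAM"
flash_per_gb = PRICE / 2048  # 2 TB = 2048 GB
dram_per_gb = PRICE / 64

print(f"DRAM / flash cost per bit: {dram_per_gb / flash_per_gb:.0f}x")  # 32x

# Hypothetical SRAM fabricated like 3D NAND, at 10x flash's area (and cost) per bit:
stacked_sram_per_gb = 10 * flash_per_gb
print(f"DRAM / stacked-SRAM cost per bit: "
      f"{dram_per_gb / stacked_sram_per_gb:.1f}x")  # ~3.2x
```

So even with a 10x cell-size penalty, flash-style SRAM would come out around 3x cheaper per bit than DRAM, which is what motivates the question.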
stacked DRAM
Current DRAM uses capacitors that are roughly cylindrical, with height > width. There are some long-term plans to turn the capacitors sideways and stack many DRAM layers, but they remain long-term plans because that's not considered economically practical yet.
DRAM also typically uses smaller feature sizes than 3d flash, which makes stacking somewhat harder. Flash memory uses less power, so cost per transistor matters more than performance, and at this point newer nodes are more expensive per transistor. And larger flash memory cells make storing multiple charge levels easier; 3-bit (8-level) flash memory is now standard, but multi-level storage doesn't work very well for DRAM.
stacked SRAM
Suppose we want to replace DRAM chiplets with stacked SRAM that's made like flash memory. What prevents that from being done? Obviously SRAM has a more complex structure, but with photolithography, the complexity of the patterns (of the same elements, at the same scale) is irrelevant to cost.
Looking at their structures, the only relevant thing SRAM has that flash doesn't seems to be wires crossing over each other, which requires connections between small horizontal and vertical wires. Well, here's a video that goes into more detail about the fabrication process of 3d flash memory. Basically, many thin layers are stacked, deep trenches/holes are etched into them, and stuff is deposited in the holes.
So, can SRAM cells be redesigned so they can be fabricated by the methods used for vertical flash memory? Sort of. Here's an example paper of people trying to do that; they estimate that density would match current planar SRAM at ~10 layers... which I think makes it too expensive to compete with DRAM while being impractical to integrate into logic chiplets. Well, I was thinking a bit about how vertical SRAM could be implemented, and found something that seems fairly practical. You just need a separate chiplet that can be processed differently, and to think outside the conjoined triangles a bit.